An Empirical Analysis of Source Context Features for Phrase-Based Statistical Machine Translation

Author

Marion Weller

Abstract

Phrase-based statistical machine translation systems make little use of context information: while the language model takes target-side context into account, context information on the source side is typically not integrated into phrase-based translation systems. Translational features such as phrase translation probabilities are learned from phrase-translation pairs extracted from word-aligned parallel corpora. Since no information is available beyond the co-occurrence frequencies of the phrase-translation pairs, all occurrences of a given source phrase are used to estimate translation probabilities, regardless of their contexts in the training data. However, information about the context of a source phrase, e.g. adjacent words or part-of-speech tags, might be a valuable resource for identifying appropriate translations in a given context. In this work, we analyze the use of source-side context features in phrase-based statistical machine translation. For every phrase in an input sentence, context-sensitive phrase translation probabilities will be estimated: by reducing the set of all phrase-translation pairs to the subset of those with the same context as the given phrase, we can compute individual translation probabilities that depend on the respective context. Assuming that the different translations of ambiguous source phrases occur in different contexts, contextually conditioned translation probabilities might help to resolve ambiguities by separating the entire set of translation candidates into subsets appropriate for different situations. Beyond disambiguation, the more refined probability estimates should also have a generally positive influence on translation quality. Furthermore, the integration of context features makes it possible to include linguistic information that is not used in standard statistical machine translation.
In our experiments, which are conducted on an English-to-German translation system, we will focus on the integration of local context features, choosing a simple method for computing contextually conditioned phrase-translation probabilities and incorporating them into a standard phrase-based statistical translation system. For all experiments, we will provide an extensive evaluation of overall translation quality using standard automatic metrics such as BLEU, and will also attempt to rate fluency and adequacy individually.
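The idea of restricting the phrase-translation pairs to those sharing the same context can be illustrated with a small relative-frequency sketch. This is not the system described in the abstract: the function names, the choice of a single adjacent word as the context feature, and the toy English-German data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def collect_counts(pairs):
    """Count translations per source phrase, with and without context.

    `pairs` is an iterable of (source_phrase, context, target_phrase)
    tuples; the single-word context is a hypothetical feature choice.
    """
    plain = defaultdict(Counter)       # keyed by source phrase
    contextual = defaultdict(Counter)  # keyed by (source phrase, context)
    for src, context, tgt in pairs:
        plain[src][tgt] += 1
        contextual[(src, context)][tgt] += 1
    return plain, contextual

def translation_prob(counts, key, tgt):
    """Relative frequency of `tgt` among all translations seen for `key`.

    Returns 0.0 for an unseen key; a real system would presumably need
    some form of smoothing or backoff here (an assumption, not a claim
    about the paper).
    """
    seen = counts.get(key)
    if not seen:
        return 0.0
    return seen[tgt] / sum(seen.values())

# Toy, made-up training pairs: English "bank" with one adjacent word.
pairs = [
    ("bank", "river", "Ufer"),
    ("bank", "river", "Ufer"),
    ("bank", "savings", "Bank"),
    ("bank", "savings", "Bank"),
    ("bank", "savings", "Bank"),
    ("bank", "the", "Bank"),
]

plain, contextual = collect_counts(pairs)

# Context-independent estimate uses all six occurrences of "bank".
p_plain = translation_prob(plain, "bank", "Ufer")
# Context-sensitive estimate uses only the occurrences next to "river".
p_context = translation_prob(contextual, ("bank", "river"), "Ufer")
print(p_plain, p_context)
```

In this toy data the context-independent estimate for "Ufer" is diluted by the occurrences of "bank" in other contexts, while the estimate restricted to the "river" subset is not. The price is sparser counts per context, which is why the sketch flags unseen contexts explicitly.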


Similar resources

Supertags as Source Language Context in Hierarchical Phrase-Based SMT

Statistical machine translation (SMT) models have recently begun to include source context modeling, under the assumption that the proper lexical choice of the translation for an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features have been explored as effective source context to improve phrase selection in SMT. In the present w...


Tree Kernel-based SVM with Structured Syntactic Knowledge for BTG-based Phrase Reordering

Structured syntactic knowledge is important for phrase reordering. This paper proposes using convolution tree kernel over source parse tree to model structured syntactic knowledge for BTG-based phrase reordering in the context of statistical machine translation. Our study reveals that the structured syntactic features over the source phrases are very effective for BTG constraint-based phrase re...


Dependency Relations as Source Context in Phrase-Based SMT

The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. ...


Rich Source-Side Context for Statistical Machine Translation

We explore the augmentation of statistical machine translation models with features of the context of each phrase to be translated. This work extends several existing threads of research in statistical MT, including the use of context in example-based machine translation (Carl and Way, 2003) and the incorporation of word sense disambiguation into a translation model (Chan et al., 2007). The con...


English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...




Publication date: 2010
